This short paper discusses continually updated causal abstractions as a potential direction of future research. The key idea is to revise the existing level of causal abstraction to a different level of detail that is both consistent with the history of observed data and more effective in solving a given task.
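Automated machine learning (AutoML) is an important step toward making machine learning models widely applicable to real-world problems. Despite many research advances, machine learning methods are not yet fully exploited by industry, mainly because of data privacy and security regulations, the high cost of storing and processing growing volumes of data at a central location, and, most importantly, a lack of expertise. We therefore introduce a novel framework, HANF ($\textbf{H}$yperparameter $\textbf{A}$nd $\textbf{N}$eural architecture search in $\textbf{F}$ederated learning), as a step toward building an AutoML framework for data distributed across several data-owner servers, without needing to bring the data to a central location. HANF jointly optimizes the neural architecture and the non-architectural hyperparameters of a learning algorithm in a data-distributed setting, using gradient-based neural architecture search and an $n$-armed bandit approach, respectively. We show that HANF efficiently finds an optimized neural architecture and tunes the hyperparameters on the data-owner servers. Moreover, HANF can be applied in both federated and non-federated settings. Empirically, we show on image classification tasks that HANF converges to well-suited architectures and sets of non-architectural hyperparameters.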
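The bandit-style hyperparameter selection mentioned above can be illustrated with a minimal epsilon-greedy sketch. This is a generic bandit strategy, not necessarily the one HANF uses, and the candidate learning rates, `toy_reward`, and all function names are hypothetical stand-ins (in practice the reward would be something like a validation score reported by the data-owner servers).

```python
import math
import random

def epsilon_greedy(arms, reward_fn, rounds=50, epsilon=0.1, seed=0):
    """Pick the arm with the best running mean reward (generic epsilon-greedy)."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(rounds):
        if t < len(arms):
            i = t  # pull every arm once before exploiting
        elif rng.random() < epsilon:
            i = rng.randrange(len(arms))  # explore a random arm
        else:
            i = max(range(len(arms)), key=means.__getitem__)  # exploit
        reward = reward_fn(arms[i])
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update
    best = max(range(len(arms)), key=means.__getitem__)
    return arms[best], counts

def toy_reward(lr):
    # Hypothetical deterministic reward that peaks at lr = 0.01.
    return -abs(math.log10(lr) - math.log10(0.01))

learning_rates = [1.0, 0.1, 0.01, 0.001]  # assumed candidate values (the "arms")
best_lr, pulls = epsilon_greedy(learning_rates, toy_reward)
print(best_lr, pulls)
```

Each arm corresponds to one hyperparameter configuration, so the search needs only a scalar reward per trial and no access to the raw data.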
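Simulations are ubiquitous in machine learning. In graph learning in particular, simulations of directed acyclic graphs (DAGs) are used to evaluate new algorithms. It was recently argued in the literature that continuous-optimization approaches to structure discovery, such as NOTEARS, may be exploiting the sortability of the variables' variances in the available data because they use least-squares losses. Since structure discovery is a key problem in science and beyond, we want methods to be invariant to the scale used for measuring the data (for example, measuring in meters rather than centimeters should not affect the causal direction inferred by the algorithm). In this work, we further strengthen this initial, negative empirical suggestion by proving key results in the multivariate case and corroborating them with further empirical evidence. In particular, we show that we can control the resulting graph with targeted variance attacks, even when we can only partially manipulate the variances of the data.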
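The scale-sensitivity argument can be made concrete in the simplest two-variable linear case. The sketch below is an illustration under stated assumptions, not the paper's multivariate result and not an implementation of any particular structure-learning method: it scores each direction by the total least-squares loss (variance of the root plus residual variance of the child), and the function names and synthetic data are hypothetical. For this score, the difference between the two directions equals $\rho^2(\mathrm{Var}(X) - \mathrm{Var}(Y))$, so the preferred direction always points from the lower-variance to the higher-variance variable.

```python
import random
from statistics import fmean

def mse_loss(cause, effect):
    """Least-squares loss of the graph cause -> effect:
    variance of the root plus residual variance of the regressed child."""
    mc, me = fmean(cause), fmean(effect)
    var_c = fmean([(c - mc) ** 2 for c in cause])
    var_e = fmean([(e - me) ** 2 for e in effect])
    cov = fmean([(c - mc) * (e - me) for c, e in zip(cause, effect)])
    residual = var_e - cov ** 2 / var_c  # variance left after the best linear fit
    return var_c + residual

def preferred_direction(x, y):
    """Direction with the smaller total least-squares loss."""
    return "x -> y" if mse_loss(x, y) < mse_loss(y, x) else "y -> x"

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 1) for xi in x]  # ground truth: x causes y
x_rescaled = [100 * xi for xi in x]     # same quantity in a different unit

print(preferred_direction(x, y))           # true direction recovered
print(preferred_direction(x_rescaled, y))  # flips after rescaling x
```

Rescaling `x` changes nothing about the data-generating process, yet it reverses the direction favored by the loss, which is the kind of dependence a targeted variance manipulation can exploit.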
The analysis of network structure is essential to many scientific areas, ranging from biology to sociology. As the computational task of clustering these networks into partitions, i.e., solving the community detection problem, is generally NP-hard, heuristic solutions are indispensable. The exploration of expedient heuristics has led to the development of particularly promising approaches in the emerging technology of quantum computing. Motivated by the substantial hardware demands of all established quantum community detection approaches, we introduce a novel QUBO-based approach that needs only as many qubits as the graph has nodes and is represented by a QUBO matrix as sparse as the input graph's adjacency matrix. This substantial improvement in the sparsity of the QUBO matrix, which is typically very dense in related work, is achieved through the novel concept of separation-nodes. Instead of assigning every node to a community directly, the approach identifies a separation-node set whose removal from the graph yields a set of connected components, which represent the core components of the communities. A greedy heuristic then assigns the nodes from the separation-node set to the identified community cores, and the subsequent experimental results provide a proof of concept. This work hence presents a promising approach to NISQ-ready quantum community detection, catalyzing the application of quantum computers to the network structure analysis of large-scale, real-world problem instances.
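The classical post-processing described above (remove the separation-nodes, take connected components as cores, then greedily attach the separation-nodes) can be sketched as follows. This is a minimal illustration that assumes the separation-node set is already given; the QUBO formulation that finds it is not shown. The function names, the example graph, and the specific greedy rule (attach each separation-node to the community containing most of its neighbours) are assumptions, since the abstract does not specify them.

```python
from collections import deque

def community_cores(adj, separators):
    """Connected components of the graph after removing the separator nodes."""
    remaining = set(adj) - set(separators)
    seen, cores = set(), []
    for start in sorted(remaining):
        if start in seen:
            continue
        core, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search restricted to non-separator nodes
            node = queue.popleft()
            core.add(node)
            for nb in adj[node]:
                if nb in remaining and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cores.append(core)
    return cores

def assign_separators(adj, cores, separators):
    """Greedy rule (assumed): attach each separator to the community
    that currently contains most of its neighbours."""
    communities = [set(c) for c in cores]
    if not communities:
        return [set(separators)]
    for s in sorted(separators):
        counts = [len(c & set(adj[s])) for c in communities]
        communities[counts.index(max(counts))].add(s)
    return communities

# Hypothetical example: two triangles bridged by node 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5],
       4: [3, 5, 6], 5: [3, 4, 6], 6: [4, 5]}
cores = community_cores(adj, {3})
communities = assign_separators(adj, cores, {3})
print(cores, communities)
```

Here node 3 has one neighbour in the first core and two in the second, so the greedy rule places it in the second community.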
Efficient surrogate modelling is a key requirement for uncertainty quantification in data-driven scenarios. In this work, we describe a novel approach that combines Sparse Random Features for surrogate modelling with self-supervised dimensionality reduction. The method is compared with other methods on synthetic data and on real data obtained from crashworthiness analyses. The results show that the described approach outperforms the state-of-the-art surrogate modelling techniques Polynomial Chaos Expansions and Neural Networks.
In the era of noisy intermediate-scale quantum devices, variational quantum circuits (VQCs) are currently one of the main strategies for building quantum machine learning models. These models are made up of a quantum part and a classical part. The quantum part is given by a parametrization $U$, which, in general, is obtained from the product of different quantum gates. The classical part, in turn, corresponds to an optimizer that updates the parameters of $U$ in order to minimize a cost function $C$. However, despite the many applications of VQCs, there are still questions to be answered, for example: What is the best sequence of gates to use? How should their parameters be optimized? Which cost function should be used? How does the architecture of the quantum chips influence the final results? In this article, we focus on answering the last question. We show that, in general, the closer the parametrization used is to a $2$-design, the more the cost function tends to a typical average value. Therefore, the closer the parametrization is to a $2$-design, the less the result of the quantum neural network model depends on its parametrization. As a consequence, we can use the architecture of the quantum chips themselves to define the VQC parametrization, avoiding the use of additional swap gates and thus reducing the VQC depth and the associated errors.
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.
Performance metrics-driven context caching has a profound impact on throughput and response time in distributed context management systems for real-time context queries. This paper proposes a reinforcement learning based approach to adaptively cache context with the objective of minimizing the cost incurred by context management systems in responding to context queries. Our novel algorithms enable context queries and sub-queries to reuse and repurpose cached context in an efficient manner. This approach differs from traditional data caching approaches in three main ways. First, we make selective context cache admissions using no prior knowledge of the context or the context query load. Second, we develop and incorporate innovative heuristic models to calculate the expected performance of caching an item when making the decisions. Third, our strategy defines a time-aware continuous cache action space. We present two reinforcement learning agents: a value function estimating actor-critic agent and a policy search agent using the deep deterministic policy gradient method. The paper also proposes adaptive policies such as eviction and cache memory scaling to complement our objective. Our method is evaluated using a synthetically generated load of context sub-queries and a synthetic data set inspired by real-world data and query samples. We further investigate optimal adaptive caching configurations under different settings. This paper presents, compares, and discusses our findings that the proposed selective caching methods reach short- and long-term cost- and performance-efficiency. The paper demonstrates that the proposed methods outperform other modes of context management, such as redirector mode, database mode, and the cache-all policy, by up to 60% in cost efficiency.
State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created constraints, or both. In contrast, end-to-end models would not suffer from the overhead of having to model prior knowledge and could learn the nuances of poetry from data alone, reducing the degree of human supervision required. In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration. We identify and address the lack of training data and mismatching tokenization algorithms as possible limitations of past attempts. In particular, we successfully pre-train and release ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles. We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2, and ChatGPT, while also being more parameter-efficient and performing favorably compared to humans. In addition, we analyze its runtime performance and introspect the model's understanding of style conditions. We make our code, models, and datasets publicly available.
Overfitting is a problem in Convolutional Neural Networks (CNNs) that causes poor generalization of models on unseen data. To remedy this problem, many new and diverse data augmentation (DA) methods have been proposed to supplement or generate more training data and thereby increase its quality. In this work, we propose a new data augmentation algorithm: VoronoiPatches (VP). We primarily utilize non-linear recombination of information within an image, fragmenting and occluding small information patches. Unlike other DA methods, VP uses small convex polygon-shaped patches in a random layout to transport information around within an image. Sudden transitions created between patches and the original image can optionally be smoothed. In our experiments, VP outperformed current DA methods regarding model variance and overfitting tendencies. We demonstrate that data augmentation utilizing non-linear recombination of information within images, together with non-orthogonal shapes and structures, improves CNN model robustness on unseen data.